fix: user login/logout sync #1558
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior.
Walkthrough

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API_Server as API Server
    participant Middleware as IntraClusterRequest
    participant Utils as create_intracluster_auth_headermap
    participant Remote as Live Node / Ingestor
    Client->>API_Server: request to protected route (session or cluster-secret)
    activate API_Server
    API_Server->>Middleware: wrapped by IntraClusterRequest
    activate Middleware
    Middleware->>Middleware: validate CLUSTER_SECRET / extract tenant + userid
    Middleware->>API_Server: inject SessionKey into req.extensions and forward
    deactivate Middleware
    API_Server->>Utils: create_intracluster_auth_headermap(req.headers, node_token, userid)
    activate Utils
    Utils->>Remote: forward request with assembled headers (Authorization / cluster-secret / tenant / intra-cluster-userid / cookies)
    deactivate Utils
    Remote-->>API_Server: response
    deactivate API_Server
    API_Server-->>Client: response
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ❌ 3 failed checks (2 warnings, 1 inconclusive)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/utils/mod.rs (1)
`309-337`: Consider logging or handling failed sync responses.

The function propagates transport errors but does not check HTTP response status codes. A node returning 4xx/5xx will be silently ignored. While this may be intentional for resilience (logout should proceed even if some nodes fail), consider logging failed sync attempts for observability.

This pattern is consistent with `login_sync` above, so if intentional, no change is needed.

♻️ Optional: Add response status check with logging

```diff
 async move {
-    INTRA_CLUSTER_CLIENT
+    let response = INTRA_CLUSTER_CLIENT
         .post(url)
         .header(header::AUTHORIZATION, node.token)
         .header(header::CONTENT_TYPE, "application/json")
         .json(&json!({ "sessionKey": _session }))
         .send()
         .await?;
+    if !response.status().is_success() {
+        tracing::warn!("logout_sync failed for node: status {}", response.status());
+    }
     Ok::<(), anyhow::Error>(())
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/mod.rs` around lines 309 - 337, The logout_sync loop currently sends requests via INTRA_CLUSTER_CLIENT inside for_each_live_node but does not inspect HTTP response status, so 4xx/5xx from nodes are ignored; update the async block in logout_sync to capture the response from .send().await, check response.status().is_success(), and when not successful use the project's logging (e.g., tracing::error! or the same logger used elsewhere) to record the node.identity/domain and status/body to aid observability; ensure the function still returns Ok(()) on per-node failures if you want to preserve resilience (i.e., log and continue rather than propagating an error), matching the behavior of login_sync if that pattern is intended.
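The log-and-continue behavior the comment asks for can be sketched independently of `reqwest`; here plain status codes stand in for HTTP responses and `eprintln!` stands in for `tracing::warn!` (all names illustrative):

```rust
// Per-node resilience: record failures without aborting the overall sync.
// `statuses` pairs a node identifier with the HTTP status code it returned.
pub fn count_failed_syncs(statuses: &[(&str, u16)]) -> usize {
    let mut failed = 0;
    for (node, status) in statuses {
        // Treat any non-2xx status as a failed sync: log it and move on,
        // mirroring the suggested `!response.status().is_success()` check.
        if !(200..300).contains(status) {
            eprintln!("logout_sync failed for node {node}: status {status}");
            failed += 1;
        }
    }
    failed
}
```

The sync still completes for every node; failures are only surfaced in logs, matching the resilience rationale above.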
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/mod.rs`:
- Around line 256-263: The loop over request cookies currently calls
map.insert(...) which overwrites the COOKIE header so only the last cookie
remains; change the call to map.append(reqwest::header::COOKIE,
reqwest::header::HeaderValue::from_bytes(cookie.as_bytes()).unwrap()) so each
iteration adds a cookie value instead of replacing the previous one (update the
code where req.get_all(...), cookie, and map are used).
---
Nitpick comments:
In `@src/utils/mod.rs`:
- Around line 309-337: The logout_sync loop currently sends requests via
INTRA_CLUSTER_CLIENT inside for_each_live_node but does not inspect HTTP
response status, so 4xx/5xx from nodes are ignored; update the async block in
logout_sync to capture the response from .send().await, check
response.status().is_success(), and when not successful use the project's
logging (e.g., tracing::error! or the same logger used elsewhere) to record the
node.identity/domain and status/body to aid observability; ensure the function
still returns Ok(()) on per-node failures if you want to preserve resilience
(i.e., log and continue rather than propagating an error), matching the behavior
of login_sync if that pattern is intended.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/cli.rs
- src/handlers/http/oidc.rs
- src/rbac/user.rs
- src/utils/mod.rs
💤 Files with no reviewable changes (1)
- src/rbac/user.rs
Actionable comments posted: 4
🧹 Nitpick comments (3)
src/utils/mod.rs (1)
`256-261`: Hashing the cluster secret on every request is expensive.

`gen_hash()` uses Argon2 with a random salt, which is computationally expensive and produces a different hash each time. The receiving middleware verifies using `PasswordHash::new(...).verify_password(...)`, which works correctly, but generating a fresh Argon2 hash per request adds significant latency.

Consider caching the hashed secret or using a simpler HMAC-based approach for intra-cluster authentication if performance becomes a concern.
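The caching suggestion can be sketched with `std::sync::OnceLock`; here `gen_hash` is passed in as a stand-in for the project's Argon2 helper, and the static's name is illustrative:

```rust
use std::sync::OnceLock;

// Computed at most once per process; later calls reuse the stored hash.
static HASHED_SECRET: OnceLock<String> = OnceLock::new();

pub fn cached_secret_hash(secret: &str, gen_hash: impl FnOnce(&str) -> String) -> &'static str {
    // `get_or_init` runs the closure only while the cell is still empty,
    // so the expensive Argon2 work happens on the first request only.
    HASHED_SECRET.get_or_init(|| gen_hash(secret)).as_str()
}
```

Because verification only needs some valid hash of the secret rather than a fresh salt per request, the receiver's `verify_password` path would stay unchanged under this scheme.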
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/mod.rs` around lines 256 - 261, The code currently calls gen_hash(secret) inside the request path (in the block using CLUSTER_SECRET.get() and map.insert with CLUSTER_SECRET_HEADER), which re-runs Argon2 with a random salt per request; instead compute and cache the hashed value once and reuse it. Add a lazily-initialized static (e.g., OnceCell/Mutex or similar) to store the precomputed hashed secret when CLUSTER_SECRET is set, change the request code to read the cached hash instead of calling gen_hash(), and keep the existing verification code (PasswordHash::new(...).verify_password(...)) unchanged so the receiver continues to verify against the cached hash. Ensure the cache is populated whenever CLUSTER_SECRET changes or on startup to avoid recomputing during request handling.

src/handlers/http/middleware.rs (1)
`547-599`: Intra-cluster session cleanup may leak sessions.

The middleware creates a temporary session with a 20-second TTL (line 576) for each intra-cluster request. If requests complete quickly, these sessions accumulate until they expire. While the short TTL mitigates this, high request volumes could create many orphan sessions.
Consider cleaning up the session after the request completes, or document that session expiry handles cleanup.
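One way to guarantee cleanup after the wrapped future completes is an RAII guard whose `Drop` removes the session even on error paths. A minimal sketch with a `Vec` standing in for the real session store (all names hypothetical):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Registers a session id on construction and removes it when dropped,
// so cleanup runs whether the wrapped request succeeds or fails.
pub struct TempSessionGuard {
    store: Rc<RefCell<Vec<String>>>,
    id: String,
}

impl TempSessionGuard {
    pub fn new(store: Rc<RefCell<Vec<String>>>, id: &str) -> Self {
        store.borrow_mut().push(id.to_string());
        Self { store, id: id.to_string() }
    }
}

impl Drop for TempSessionGuard {
    fn drop(&mut self) {
        // Remove only this guard's session, leaving others untouched.
        self.store.borrow_mut().retain(|s| s != &self.id);
    }
}
```

In the middleware such a guard would live across the awaited future, leaving the 20-second TTL as a safety net rather than the primary cleanup path.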
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers/http/middleware.rs` around lines 547 - 599, The middleware currently creates a temporary session (SessionKey::SessionId via Ulid::new()) and registers it with Users.new_session(..., TimeDelta::seconds(20)) inside call(&self, mut req), which can leak many short-lived sessions; modify call to record the created session (from req.extensions_mut() or the local session variable) and, after awaiting the inner future (the fut.await in the boxed async), explicitly remove/close that session (e.g. Users.end_session(session) or Users.delete_session(session)) and remove the extension/header so the temporary session is cleaned up on request completion; ensure you only attempt cleanup when a session was actually created (the same condition that led to calling Users.new_session).

src/alerts/alert_types.rs (1)
`103-106`: Performance concern: Argon2 hashing on every intra-cluster request adds unnecessary latency.

While the current implementation correctly uses `Argon2::verify_password()` on the server side (src/handlers/http/middleware.rs:557-560), calling `gen_hash(secret)` on every alert evaluation is inefficient. Argon2 is intentionally slow, designed for password storage rather than request authentication. This compounds across high-frequency intra-cluster calls.

Consider using a faster authentication mechanism for intra-cluster communication, such as HMAC-SHA256, to reduce per-request overhead while maintaining security.
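The verification half of the suggested HMAC scheme needs a constant-time comparison. A std-only sketch of that piece (the HMAC-SHA256 tag itself would come from a crate such as `hmac`/`sha2`, which is assumed and not shown here):

```rust
// Compare two byte strings without short-circuiting on the first mismatch,
// so timing does not leak how much of an auth tag was correct.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        // Accumulate differences; loop always runs over the full length.
        diff |= x ^ y;
    }
    diff == 0
}
```

Unlike Argon2, an HMAC tag is cheap to compute on both sides, so this check could run on every intra-cluster request without the per-request latency described above.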
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/alerts/alert_types.rs` around lines 103 - 106, Replace the per-request Argon2 hash generation (gen_hash) used when inserting the CLUSTER_SECRET_HEADER with a fast MAC approach: compute an HMAC-SHA256 of the shared secret once (or cache it) and use HeaderValue::from_str(&hex_or_base64_hmac) when building headers (where HeaderName::from_static(CLUSTER_SECRET_HEADER) is set), and update the server-side check to verify the HMAC (constant-time compare) instead of calling Argon2::verify_password; this removes the expensive gen_hash call from alert evaluation and centralizes the lightweight HMAC computation/verification logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/handlers/http/cluster/mod.rs`:
- Line 727: Replace the panic-prone call to get_user_from_request(req).unwrap()
with safe error handling: call get_user_from_request(req) and match or use the ?
operator to return an appropriate error instead of unwrapping (e.g., propagate a
RoleError::Unauthorized or add that variant to RoleError if missing); update the
surrounding function to return Result<..., RoleError> (or map into an existing
RoleError variant) so invalid/expired sessions produce a handled Unauthorized
error rather than panicking in cluster::mod.rs where get_user_from_request is
used.
In `@src/handlers/http/middleware.rs`:
- Around line 557-561: The code currently calls
PasswordHash::new(incoming_secret).unwrap(), which can panic on malformed PHC
strings; change this to gracefully handle parse errors by replacing the unwrap
with proper error handling (e.g., match or if let Err) around
PasswordHash::new(incoming_secret) and on Err log/return an authentication
failure instead of panicking; keep the subsequent
Argon2::default().verify_password(secret.as_bytes(), &parsed_hash) path
unchanged for the Ok(parsed_hash) case so malformed incoming_secret no longer
crashes the server.
In `@src/rbac/map.rs`:
- Around line 307-322: In remove_all_expired_sessions, the retain closure is
inverted: it currently returns true for expired sessions and false for valid
ones, so valid sessions are removed while expired ones are kept; change the
closure in user_sessions.iter_mut() so expired sessions call
self.active_sessions.remove(s) and return false (to drop them) and valid
sessions return true (to keep them), i.e. swap the boolean branches in the
retain for the user_sessions and ensure the call to active_sessions.remove(s)
remains in the expired-branch; reference symbols: remove_all_expired_sessions,
user_sessions, active_sessions, sessions.retain.
In `@src/storage/field_stats.rs`:
- Around line 576-590: The code currently calls
get_user_from_request(&req).unwrap() inside the
create_intracluster_auth_headermap call which can panic on missing/invalid
session; update the logic in the block that builds the HeaderMap (around
create_intracluster_auth_headermap usage) to handle the Result/Option returned
by get_user_from_request safely (e.g., match or if let), bail out or return None
/ propagate an error when there is no user instead of unwrapping, and only call
create_intracluster_auth_headermap when a valid user is present; ensure you keep
the HeaderMap construction (HeaderMap::new, HeaderName::from_bytes,
HeaderValue::from_bytes, map.insert) the same but guarded by the successful user
extraction.
---
Nitpick comments:
In `@src/alerts/alert_types.rs`:
- Around line 103-106: Replace the per-request Argon2 hash generation (gen_hash)
used when inserting the CLUSTER_SECRET_HEADER with a fast MAC approach: compute
an HMAC-SHA256 of the shared secret once (or cache it) and use
HeaderValue::from_str(&hex_or_base64_hmac) when building headers (where
HeaderName::from_static(CLUSTER_SECRET_HEADER) is set), and update the
server-side check to verify the HMAC (constant-time compare) instead of calling
Argon2::verify_password; this removes the expensive gen_hash call from alert
evaluation and centralizes the lightweight HMAC computation/verification logic.
In `@src/handlers/http/middleware.rs`:
- Around line 547-599: The middleware currently creates a temporary session
(SessionKey::SessionId via Ulid::new()) and registers it with
Users.new_session(..., TimeDelta::seconds(20)) inside call(&self, mut req),
which can leak many short-lived sessions; modify call to record the created
session (from req.extensions_mut() or the local session variable) and, after
awaiting the inner future (the fut.await in the boxed async), explicitly
remove/close that session (e.g. Users.end_session(session) or
Users.delete_session(session)) and remove the extension/header so the temporary
session is cleaned up on request completion; ensure you only attempt cleanup
when a session was actually created (the same condition that led to calling
Users.new_session).
In `@src/utils/mod.rs`:
- Around line 256-261: The code currently calls gen_hash(secret) inside the
request path (in the block using CLUSTER_SECRET.get() and map.insert with
CLUSTER_SECRET_HEADER), which re-runs Argon2 with a random salt per request;
instead compute and cache the hashed value once and reuse it. Add a
lazily-initialized static (e.g., OnceCell/Mutex or similar) to store the
precomputed hashed secret when CLUSTER_SECRET is set, change the request code to
read the cached hash instead of calling gen_hash(), and keep the existing
verification code (PasswordHash::new(...).verify_password(...)) unchanged so the
receiver continues to verify against the cached hash. Ensure the cache is
populated whenever CLUSTER_SECRET changes or on startup to avoid recomputing
during request handling.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- src/alerts/alert_types.rs
- src/handlers/http/cluster/mod.rs
- src/handlers/http/ingest.rs
- src/handlers/http/logstream.rs
- src/handlers/http/middleware.rs
- src/handlers/http/modal/ingest_server.rs
- src/handlers/http/modal/query/querier_role.rs
- src/handlers/http/modal/server.rs
- src/handlers/http/rbac.rs
- src/rbac/map.rs
- src/rbac/mod.rs
- src/rbac/user.rs
- src/storage/field_stats.rs
- src/utils/actix.rs
- src/utils/mod.rs
- src/validator.rs
041854d to 4a423fb (Compare)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/handlers/http/middleware.rs (1)
`451-452`: ⚠️ Potential issue | 🔴 Critical

Route parameter mismatch: `auth_user_context` still looks for `"username"`.

The routes have been renamed from `{username}` to `{userid}`, but `auth_user_context` at line 451 still calls `req.match_info().get("username")`. This will cause authorization to fail for user-context routes.

🐛 Proposed fix

```diff
 pub fn auth_user_context(
     req: &mut ServiceRequest,
     action: Action,
 ) -> Result<rbac::Response, Error> {
     // check if tenant is suspended
     if let rbac::Response::Suspended(msg) = check_suspension(req.request(), action) {
         return Ok(rbac::Response::Suspended(msg));
     }
     let creds = extract_session_key(req);
-    let user = req.match_info().get("username");
+    let user = req.match_info().get("userid");
     creds.map(|key| Users.authorize(key, action, None, user))
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers/http/middleware.rs` around lines 451 - 452, In auth_user_context update the route param lookup from req.match_info().get("username") to req.match_info().get("userid") so the user context matches the renamed routes; locate the auth_user_context invocation (the let user = ... line) and change the key passed to req.match_info().get to "userid" (keeping the rest of the call to Users.authorize(key, action, None, user) unchanged).
♻️ Duplicate comments (5)
src/utils/mod.rs (1)
`270-277`: ⚠️ Potential issue | 🟠 Major

Bug: Only the last cookie is retained due to `insert()` overwriting.

The loop iterates over all cookies but `insert()` replaces the previous value for the same key. Only the last cookie will be preserved. The author indicated this was "fixed in next commit" but the code still uses `insert()`.

🐛 Proposed fix

```diff
 } else if req.contains_key(actix_web::http::header::COOKIE) {
     // multiple cookies
     for cookie in req.get_all(actix_web::http::header::COOKIE) {
-        map.insert(
+        map.append(
             reqwest::header::COOKIE,
             reqwest::header::HeaderValue::from_bytes(cookie.as_bytes()).unwrap(),
         );
     }
```
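An alternative to per-cookie `append` is collapsing all pairs into a single `Cookie` header value, which RFC 6265 defines as cookie pairs joined by `"; "`. A std-only sketch (helper name illustrative):

```rust
// Join individual "name=value" cookie pairs into one Cookie header value,
// e.g. ["a=1", "b=2"] becomes "a=1; b=2", per RFC 6265's cookie-string grammar.
pub fn join_cookie_pairs<'a>(pairs: impl IntoIterator<Item = &'a str>) -> String {
    pairs.into_iter().collect::<Vec<_>>().join("; ")
}
```

Either approach, one combined value or `HeaderMap::append` per cookie, forwards every cookie instead of only the last one.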
Verify each finding against the current code and only fix it if needed. In `@src/utils/mod.rs` around lines 270 - 277, The loop over req.get_all(...) currently calls map.insert(...) which overwrites previous cookie headers so only the last cookie remains; change it to accumulate all cookies instead of inserting repeatedly—either build a single cookie string by joining each cookie.as_str() with "; " and then insert one HeaderValue, or call map.append(reqwest::header::COOKIE, HeaderValue::from_bytes(...).unwrap()) for each cookie; update the code around req.get_all(...), req, map, and the HeaderValue creation to use append or a single combined HeaderValue so all cookies are sent.

src/rbac/map.rs (1)
`307-322`: ⚠️ Potential issue | 🔴 Critical

`retain` logic is inverted: removes valid sessions, keeps expired ones.

This issue was previously flagged. The `retain` closure returns `true` to keep an element:

- When `now > *t` (expired): returns `true` → keeps the expired session
- When `now <= *t` (valid): returns `false` → removes the valid session

The boolean returns need to be swapped.
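The intended `retain` semantics can be checked in isolation with simplified std types, a `Vec` of `(id, expiry)` pairs and a `HashSet` of active ids (illustrative only, not the project's actual maps):

```rust
use std::collections::HashSet;

// Keep valid sessions (return true), drop expired ones (return false),
// removing dropped ids from the active set as a side effect.
pub fn remove_expired(
    sessions: &mut Vec<(String, u64)>,
    active: &mut HashSet<String>,
    now: u64,
) {
    sessions.retain(|(id, expiry)| {
        if now > *expiry {
            active.remove(id); // expired: also drop from the active set
            false
        } else {
            true
        }
    });
}
```

Here the expired branch returns `false` so the entry is removed from both structures, and the valid branch returns `true` to keep it, the opposite of the flagged code.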
🐛 Proposed fix

```diff
 pub fn remove_all_expired_sessions(&mut self, tenant_id: &str) {
     let now = Utc::now();
     if let Some(user_sessions) = self.user_sessions.get_mut(tenant_id) {
         for (_, sessions) in user_sessions.iter_mut() {
             sessions.retain(|(s, t)| {
                 if now > *t {
                     self.active_sessions.remove(s);
-                    true
+                    false // expired: remove from user_sessions
                 } else {
-                    false
+                    true // valid: keep in user_sessions
                 }
             });
         }
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/rbac/map.rs` around lines 307 - 322, In remove_all_expired_sessions, the sessions.retain closure is inverted: change it so it returns true to keep valid sessions and false to drop expired ones — i.e., if now > *t (expired) then remove the session id from self.active_sessions and return false, otherwise return true to keep the tuple; update the closure in the user_sessions.iter_mut() block (sessions.retain) accordingly so active_sessions is cleaned only when a session is dropped.

src/storage/field_stats.rs (1)
`576-590`: ⚠️ Potential issue | 🟠 Major

Potential panic on missing user session.

`get_user_from_request(&req).unwrap()` at line 579 will panic if session key extraction fails or the user isn't found in the session map. This could cause request failures in edge cases (e.g., expired sessions, race conditions during logout).

🛡️ Proposed fix to handle the error gracefully

```diff
 } else {
-    let auth = create_intracluster_auth_headermap(
-        req.headers(),
-        "",
-        &get_user_from_request(&req).unwrap(),
-    );
+    let userid = match get_user_from_request(&req) {
+        Ok(id) => id,
+        Err(e) => {
+            error!("Failed to get user from request: {:?}", e);
+            return Err(QueryError::Unauthorized);
+        }
+    };
+    let auth = create_intracluster_auth_headermap(
+        req.headers(),
+        "",
+        &userid,
+    );
     let mut map = HeaderMap::new();
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage/field_stats.rs` around lines 576 - 590, The code calls get_user_from_request(&req).unwrap() inside the create_intracluster_auth_headermap call which will panic on missing/invalid session; replace the unwrap with proper error handling: call get_user_from_request(&req) and match or if-let on its Option/Result, and if it returns None/Err either return an appropriate error response or skip creating auth headers (e.g., return None or propagate a descriptive error), then proceed to build the HeaderMap from the resulting auth iterator (create_intracluster_auth_headermap) only when a valid user exists; update any callers of this block to handle the new None/Error case accordingly so no panic can occur.

src/handlers/http/middleware.rs (1)
`547-589`: ⚠️ Potential issue | 🟠 Major

Potential panic on malformed hash input.

`PasswordHash::new(incoming_secret).unwrap()` at line 557 will panic if the incoming header contains a malformed PHC string. A malicious or buggy client could crash the server.

🛡️ Proposed fix to handle parse errors gracefully

```diff
 {
     // validate the incoming header value
-    let parsed_hash = PasswordHash::new(incoming_secret).unwrap();
-    if Argon2::default()
-        .verify_password(hash.as_bytes(), &parsed_hash)
-        .is_ok()
+    if let Ok(parsed_hash) = PasswordHash::new(incoming_secret)
+        && Argon2::default()
+            .verify_password(hash.as_bytes(), &parsed_hash)
+            .is_ok()
     {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers/http/middleware.rs` around lines 547 - 589, In call(&self, mut req: ServiceRequest) replace the panic-prone PasswordHash::new(incoming_secret).unwrap() with safe parsing (e.g. match or if let Ok(parsed_hash) = PasswordHash::new(incoming_secret)) and handle the Err case by returning the same Some(...) error path (or a specific validation error) so malformed PHC strings don't panic; keep the subsequent Argon2::default().verify_password(...) using the successfully parsed parsed_hash and preserve existing behavior for session creation (referencing CLUSTER_SECRET, CLUSTER_SECRET_HEADER, PasswordHash::new, Argon2::default().verify_password, Users.get_user, Ulid, SessionKey::SessionId).

src/handlers/http/cluster/mod.rs (1)
`721-743`: ⚠️ Potential issue | 🟠 Major

Potential panic on missing user session.

`get_user_from_request(req).unwrap()` at line 727 will panic if the session is invalid or expired.

🛡️ Proposed fix to handle errors gracefully

```diff
 pub async fn sync_role_update(
     req: &HttpRequest,
     name: String,
     role: Role,
     tenant_id: &Option<String>,
 ) -> Result<(), RoleError> {
     let tenant = tenant_id.to_owned();
-    let userid = get_user_from_request(req).unwrap();
+    let userid = get_user_from_request(req).map_err(|e| {
+        error!("Failed to get user from request for role sync: {:?}", e);
+        RoleError::Anyhow(anyhow::anyhow!("Unauthorized: failed to get user from request"))
+    })?;
     let headers = req.headers().clone();
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers/http/cluster/mod.rs` around lines 721 - 743, The code currently calls get_user_from_request(req).unwrap() which can panic on missing/expired sessions; update the cluster sync handler to handle the missing user gracefully by matching the result of get_user_from_request(req) (or using ? if it returns Result) and returning an appropriate Err(RoleError) instead of unwrapping; keep the userid variable (e.g., userid or user_id) for downstream use, and ensure the async closure that references userid uses the handled value (clone if needed) so no unwrap is left in the path.
🧹 Nitpick comments (3)
src/rbac/user.rs (2)
`168-168`: Consider documenting the reason for public visibility.

The `gen_hash` function is now `pub`. While this enables external usage, consider adding a brief doc comment explaining why public access is needed, to help future maintainers understand the design decision.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/rbac/user.rs` at line 168, gen_hash was made public; add a concise doc comment above the pub fn gen_hash(password: &str) -> String explaining why external code needs to call it (e.g., to allow other crates/modules to generate/verify password hashes for authentication, testing, or migration), and note any security considerations/expectations (inputs should be plaintext passwords, output is a salted hash, callers must handle secrets securely) so future maintainers understand the design decision.
`368-376`: Internal roles reported as "non-existent" may be misleading.

The validation now correctly prevents adding internal roles to groups (per the comment on line 368). However, when an internal role is rejected, it's added to `non_existent_roles`, which may confuse users since the role does exist; it's just not assignable to groups.

Consider using a separate error category like `non_assignable_roles` to distinguish between truly non-existent roles and roles that exist but cannot be assigned to groups.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/rbac/user.rs` around lines 368 - 376, The current validation loop on roles() that checks self.roles pushes internal (existing-but-not-assignable) roles into non_existent_roles, which is misleading; introduce a separate collection (e.g., non_assignable_roles) and, inside the loop where you detect role.role_type() != RoleType::User, push the role into non_assignable_roles instead of non_existent_roles; update any downstream code that reports or returns non_existent_roles to also include non_assignable_roles (or adjust error construction) so callers can distinguish truly missing roles from existing internal roles; reference the existing symbols roles(), self.roles, non_existent_roles and ensure error/return logic is updated accordingly.

src/handlers/http/middleware.rs (1)
`568-577`: Session cleanup consideration for short-lived intra-cluster sessions.

The session created with `TimeDelta::seconds(20)` is intentionally short-lived, which is good. However, consider whether there's a mechanism to clean up these sessions after they expire, or if they accumulate until a periodic cleanup runs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers/http/middleware.rs` around lines 568 - 577, The short-lived session created via Users.new_session(&user, SessionKey::SessionId(id), TimeDelta::seconds(20)) may accumulate if there is no expiration/cleanup path; ensure sessions are removed after expiry by either (a) using a session store that enforces TTL on insert (so new_session sets a TTL of 20s) or (b) adding an explicit expiration handler: record expiry time when creating the SessionKey in req.extensions (or in the user/session store) and register it with the existing periodic cleanup routine or spawn a lightweight timeout task to call Users.delete_session/remove_session when expired; update Users.get_user/new_session/delete_session (or the session store implementation) to support TTL-aware removal so short-lived intra-cluster sessions are cleaned up.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/handlers/http/middleware.rs`:
- Around line 508-511: The doc comment above the IntraClusterRequest middleware
is incorrect (it describes DisAllowRootUser); update the comment to accurately
describe IntraClusterRequest behavior: that it validates/authorizes internal
cluster requests (e.g., checks a shared secret/header or verifies source is
in-cluster) and documents expected headers, error responses, and purpose. Locate
the comment immediately above the IntraClusterRequest struct/impl in
middleware.rs and replace the copy-pasted DisAllowRootUser text with a concise
description of IntraClusterRequest's responsibilities, inputs, and failure
behavior.
In `@src/rbac/mod.rs`:
- Around line 227-229: The session cleanup call
mut_sessions().remove_all_expired_sessions(tenant) depends on a buggy
implementation with inverted retain logic; either fix the underlying function
remove_all_expired_sessions in src/rbac/map.rs to correctly retain non-expired
sessions (i.e., flip the predicate so expired sessions are removed and valid
ones kept) or replace this call here with explicit, correct cleanup logic that
iterates sessions and removes only those with expiry <= now(); update or add
unit tests around remove_all_expired_sessions (and any callers) to assert valid
sessions remain and only expired ones are removed.
---
Outside diff comments:
In `@src/handlers/http/middleware.rs`:
- Around line 451-452: In auth_user_context update the route param lookup from
req.match_info().get("username") to req.match_info().get("userid") so the user
context matches the renamed routes; locate the auth_user_context invocation (the
let user = ... line) and change the key passed to req.match_info().get to
"userid" (keeping the rest of the call to Users.authorize(key, action, None,
user) unchanged).
---
Duplicate comments:
In `@src/handlers/http/cluster/mod.rs`:
- Around line 721-743: The code currently calls
get_user_from_request(req).unwrap() which can panic on missing/expired sessions;
update the cluster sync handler to handle the missing user gracefully by
matching the result of get_user_from_request(req) (or using ? if it returns
Result) and returning an appropriate Err(RoleError) instead of unwrapping; keep
the userid variable (e.g., userid or user_id) for downstream use, and ensure the
async closure that references userid uses the handled value (clone if needed) so
no unwrap is left in the path.
In `@src/handlers/http/middleware.rs`:
- Around line 547-589: In call(&self, mut req: ServiceRequest) replace the
panic-prone PasswordHash::new(incoming_secret).unwrap() with safe parsing (e.g.
match or if let Ok(parsed_hash) = PasswordHash::new(incoming_secret)) and handle
the Err case by returning the same Some(...) error path (or a specific
validation error) so malformed PHC strings don't panic; keep the subsequent
Argon2::default().verify_password(...) using the successfully parsed parsed_hash
and preserve existing behavior for session creation (referencing CLUSTER_SECRET,
CLUSTER_SECRET_HEADER, PasswordHash::new, Argon2::default().verify_password,
Users.get_user, Ulid, SessionKey::SessionId).
In `@src/rbac/map.rs`:
- Around line 307-322: In remove_all_expired_sessions, the sessions.retain
closure is inverted: change it so it returns true to keep valid sessions and
false to drop expired ones — i.e., if now > *t (expired) then remove the session
id from self.active_sessions and return false, otherwise return true to keep the
tuple; update the closure in the user_sessions.iter_mut() block
(sessions.retain) accordingly so active_sessions is cleaned only when a session
is dropped.
In `@src/storage/field_stats.rs`:
- Around line 576-590: The code calls get_user_from_request(&req).unwrap()
inside the create_intracluster_auth_headermap call which will panic on
missing/invalid session; replace the unwrap with proper error handling: call
get_user_from_request(&req) and match or if-let on its Option/Result, and if it
returns None/Err either return an appropriate error response or skip creating
auth headers (e.g., return None or propagate a descriptive error), then proceed
to build the HeaderMap from the resulting auth iterator
(create_intracluster_auth_headermap) only when a valid user exists; update any
callers of this block to handle the new None/Error case accordingly so no panic
can occur.
In `@src/utils/mod.rs`:
- Around line 270-277: The loop over req.get_all(...) currently calls
map.insert(...) which overwrites previous cookie headers so only the last cookie
remains; change it to accumulate all cookies instead of inserting
repeatedly—either build a single cookie string by joining each cookie.as_str()
with "; " and then insert one HeaderValue, or call
map.append(reqwest::header::COOKIE, HeaderValue::from_bytes(...).unwrap()) for
each cookie; update the code around req.get_all(...), req, map, and the
HeaderValue creation to use append or a single combined HeaderValue so all
cookies are sent.
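The joining approach can be sketched without the actual `reqwest` types (the real fix would either `map.append` each value or insert one combined `HeaderValue`); the iterator below stands in for `req.get_all(COOKIE)`:

```rust
// Sketch: combine every incoming Cookie header value into a single
// "; "-joined string, so no cookie is lost to repeated insert() calls.
// Names are illustrative, not the crate's API.
fn combine_cookie_headers<'a, I: IntoIterator<Item = &'a str>>(cookies: I) -> Option<String> {
    let joined = cookies.into_iter().collect::<Vec<_>>().join("; ");
    if joined.is_empty() { None } else { Some(joined) }
}
```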
---
Nitpick comments:
In `@src/handlers/http/middleware.rs`:
- Around line 568-577: The short-lived session created via
Users.new_session(&user, SessionKey::SessionId(id), TimeDelta::seconds(20)) may
accumulate if there is no expiration/cleanup path; ensure sessions are removed
after expiry by either (a) using a session store that enforces TTL on insert (so
new_session sets a TTL of 20s) or (b) adding an explicit expiration handler:
record expiry time when creating the SessionKey in req.extensions (or in the
user/session store) and register it with the existing periodic cleanup routine
or spawn a lightweight timeout task to call Users.delete_session/remove_session
when expired; update Users.get_user/new_session/delete_session (or the session
store implementation) to support TTL-aware removal so short-lived intra-cluster
sessions are cleaned up.
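One way to make the 20-second sessions self-expiring is to record the expiry at insert time and treat expired entries as absent, as in this simplified sketch (the type is hypothetical, not the actual session store):

```rust
use std::time::{Duration, Instant};

// Hypothetical TTL-aware session entry: expiry is fixed when the entry is
// created, so a periodic sweep (or the next read) can drop stale entries.
struct TtlSession {
    expires_at: Instant,
}

impl TtlSession {
    fn new(ttl: Duration) -> Self {
        Self { expires_at: Instant::now() + ttl }
    }

    fn is_expired(&self) -> bool {
        Instant::now() >= self.expires_at
    }
}
```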
In `@src/rbac/user.rs`:
- Line 168: gen_hash was made public; add a concise doc comment above the pub fn
gen_hash(password: &str) -> String explaining why external code needs to call it
(e.g., to allow other crates/modules to generate/verify password hashes for
authentication, testing, or migration), and note any security
considerations/expectations (inputs should be plaintext passwords, output is a
salted hash, callers must handle secrets securely) so future maintainers
understand the design decision.
- Around line 368-376: The current validation loop on roles() that checks
self.roles pushes internal (existing-but-not-assignable) roles into
non_existent_roles, which is misleading; introduce a separate collection (e.g.,
non_assignable_roles) and, inside the loop where you detect role.role_type() !=
RoleType::User, push the role into non_assignable_roles instead of
non_existent_roles; update any downstream code that reports or returns
non_existent_roles to also include non_assignable_roles (or adjust error
construction) so callers can distinguish truly missing roles from existing
internal roles; reference the existing symbols roles(), self.roles,
non_existent_roles and ensure error/return logic is updated accordingly.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- src/alerts/alert_types.rs
- src/handlers/http/cluster/mod.rs
- src/handlers/http/ingest.rs
- src/handlers/http/logstream.rs
- src/handlers/http/middleware.rs
- src/handlers/http/modal/ingest_server.rs
- src/handlers/http/modal/query/querier_role.rs
- src/handlers/http/modal/server.rs
- src/handlers/http/oidc.rs
- src/handlers/http/rbac.rs
- src/rbac/map.rs
- src/rbac/mod.rs
- src/rbac/user.rs
- src/storage/field_stats.rs
- src/utils/actix.rs
- src/utils/mod.rs
- src/validator.rs
💤 Files with no reviewable changes (1)
- src/handlers/http/oidc.rs
🚧 Files skipped from review as they are similar to previous changes (5)
- src/handlers/http/modal/ingest_server.rs
- src/validator.rs
- src/handlers/http/ingest.rs
- src/handlers/http/modal/query/querier_role.rs
- src/handlers/http/logstream.rs
♻️ Duplicate comments (2)
src/handlers/http/middleware.rs (2)
508-511: ⚠️ Potential issue | 🟡 Minor

Doc comment above `IntraClusterRequest` is inaccurate. Lines 508-511 still describe root-user protection behavior, not intra-cluster request validation.

📝 Proposed fix

```diff
-// The credentials set in the env vars (P_USERNAME & P_PASSWORD) are treated
-// as root credentials. Any other user is not allowed to modify or delete
-// the root user. Deny request if username is same as username
-// from env variable P_USERNAME.
+/// Middleware that validates intra-cluster requests using `x-p-cluster-secret`
+/// and short-lived user session injection (`intra-cluster-tenant` / `intra-cluster-userid`).
+/// Rejects requests when required headers are missing or secret verification fails.
 pub struct IntraClusterRequest;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers/http/middleware.rs` around lines 508 - 511, The doc comment above the IntraClusterRequest type is incorrect (it describes root-user protection) — update the documentation for the IntraClusterRequest struct/enum/impl to explain that it validates and represents intra-cluster requests (e.g., checks the X-Cluster-Token or equivalent header/secret, ensures requests originate from trusted cluster peers, and how validation failure is handled), and remove or replace any text about P_USERNAME/P_PASSWORD or root-user modification; locate the IntraClusterRequest definition and its associated methods (e.g., validate/from_request) and rewrite the comment to accurately describe its purpose and behavior.
557-560: ⚠️ Potential issue | 🟠 Major

Avoid panic on malformed intra-cluster secret header. Line 557 uses `unwrap()` on `PasswordHash::new(...)`; malformed header input can panic the worker instead of returning auth failure.

🐛 Proposed fix

```diff
-            let parsed_hash = PasswordHash::new(incoming_secret).unwrap();
-            if Argon2::default()
-                .verify_password(secret.as_bytes(), &parsed_hash)
-                .is_ok()
+            if let Ok(parsed_hash) = PasswordHash::new(incoming_secret)
+                && Argon2::default()
+                    .verify_password(secret.as_bytes(), &parsed_hash)
+                    .is_ok()
             {
```

```shell
#!/bin/bash
# Verify no panic-prone PasswordHash parsing remains.
rg -nP 'PasswordHash::new\([^)]*\)\.unwrap\(\)' --type rust
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/handlers/http/middleware.rs` around lines 557 - 560, Replace the panic-prone unwrap on PasswordHash::new(incoming_secret) with proper error handling: call PasswordHash::new(incoming_secret) and if it returns Err treat the header as an authentication failure (return Err or continue the auth-failure path), log or debug the parse error if desired, otherwise proceed to call Argon2::default().verify_password(secret.as_bytes(), &parsed_hash) as before; update the code around the parsed_hash/verify_password usage (referencing PasswordHash::new, parsed_hash, verify_password, incoming_secret, and secret) so malformed headers do not cause a panic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/handlers/http/middleware.rs`:
- Around line 508-511: The doc comment above the IntraClusterRequest type is
incorrect (it describes root-user protection) — update the documentation for the
IntraClusterRequest struct/enum/impl to explain that it validates and represents
intra-cluster requests (e.g., checks the X-Cluster-Token or equivalent
header/secret, ensures requests originate from trusted cluster peers, and how
validation failure is handled), and remove or replace any text about
P_USERNAME/P_PASSWORD or root-user modification; locate the IntraClusterRequest
definition and its associated methods (e.g., validate/from_request) and rewrite
the comment to accurately describe its purpose and behavior.
- Around line 557-560: Replace the panic-prone unwrap on
PasswordHash::new(incoming_secret) with proper error handling: call
PasswordHash::new(incoming_secret) and if it returns Err treat the header as an
authentication failure (return Err or continue the auth-failure path), log or
debug the parse error if desired, otherwise proceed to call
Argon2::default().verify_password(secret.as_bytes(), &parsed_hash) as before;
update the code around the parsed_hash/verify_password usage (referencing
PasswordHash::new, parsed_hash, verify_password, incoming_secret, and secret) so
malformed headers do not cause a panic.
Actionable comments posted: 3
♻️ Duplicate comments (1)
src/handlers/http/cluster/mod.rs (1)
526-527: ⚠️ Potential issue | 🔴 Critical

Replace panic-prone `get_user_from_request(...).unwrap()` in sync paths. Line 526, Line 577, Line 686, Line 734, and Line 783 can panic on invalid/expired/missing session context. These are request-path panics and should return typed errors.

Suggested fix pattern

```diff
-    let caller_userid = get_user_from_request(req).unwrap();
+    let caller_userid = get_user_from_request(req)?;
```

For functions returning `Result<(), RoleError>`:

```diff
-    let userid = get_user_from_request(req).unwrap();
+    let userid = get_user_from_request(req)
+        .map_err(|e| RoleError::Anyhow(anyhow::anyhow!("failed to resolve caller userid: {e}")))?;
```

```shell
#!/bin/bash
# Verify all panic-prone user extraction callsites in cluster sync flows.
rg -nP 'get_user_from_request\([^)]*\)\.unwrap\(\)' src/handlers/http/cluster/mod.rs
```

Also applies to: 577-578, 686-687, 734-735, 783-784
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/handlers/http/cluster/mod.rs`:
- Around line 1981-1989: The code deriving "intra-cluster-userid" currently uses
multiple unwrap()s and can panic; replace the unwraps in the block that checks
map.get("intra-cluster-userid") by validating each step: split querier.token()
safely (check token.len() > 1), attempt BASE64_STANDARD.decode(...) and handle
the Result, convert to UTF-8 with String::from_utf8(...) handling the Err,
ensure split_once(':') returns Some, and only then call
HeaderValue::from_str(...) (handle its Err) and insert the header; on any
failure return an appropriate error or early-exit path instead of panicking.
Ensure you reference querier.token(), BASE64_STANDARD.decode, String::from_utf8,
split_once, HeaderValue::from_str and HeaderName::from_static while implementing
the checks.
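The unwrap-free chain the prompt describes can be sketched with `Option` combinators; `decode_base64` below is a stand-in parameter for `BASE64_STANDARD.decode`, so the sketch stays dependency-free:

```rust
// Sketch: parse a "Basic <base64(user:pass)>" token without any unwrap(),
// returning None on every malformed step instead of panicking.
// `decode_base64` stands in for BASE64_STANDARD.decode.
fn parse_intra_cluster_userid(
    token: &str,
    decode_base64: impl Fn(&str) -> Option<Vec<u8>>,
) -> Option<(String, String)> {
    let mut parts = token.split_whitespace();
    let _scheme = parts.next()?;               // e.g. "Basic"
    let payload = parts.next()?;               // guards against a one-part token
    let bytes = decode_base64(payload)?;       // handle the decode failure
    let text = String::from_utf8(bytes).ok()?; // handle invalid UTF-8
    let (user, rest) = text.split_once(':')?;  // ensure split_once returns Some
    Some((user.to_string(), rest.to_string()))
}
```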
In `@src/handlers/http/modal/ingest/ingestor_role.rs`:
- Around line 97-98: The comment above the DELETE handler in
src/handlers/http/modal/ingest/ingestor_role.rs is stale (it currently reads
"Handler for PUT /api/v1/role/{name}" and mentions create/update). Update the
comment that precedes the delete handler function (the DELETE role handler in
this file) to correctly state it handles DELETE /api/v1/role/{name} and that it
deletes an existing role (or similar wording), ensuring it matches the handler's
behavior and function name (the DELETE role handler in ingestor_role.rs).
- Around line 99-125: The delete handler removes the role from metadata
(metadata.roles.remove) and from the in-memory cache
(mut_roles().entry(...).remove(&name)) but doesn't invalidate active sessions;
add a call after storage::put_staging_metadata and the mut_roles removal to
invalidate sessions for any user/group that had this role—e.g., invoke the
session manager or session store method (such as
session::invalidate_sessions_for_role or SessionManager::invalidate_by_role)
using the resolved tenant_id and role name so all affected sessions are cleared
immediately. Ensure the new call is awaited if async and handle/report errors
via RoleError consistent with the function's error handling.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- src/handlers/http/cluster/mod.rs
- src/handlers/http/modal/ingest/ingestor_rbac.rs
- src/handlers/http/modal/ingest/ingestor_role.rs
- src/handlers/http/modal/ingest_server.rs
- src/handlers/http/modal/query/querier_rbac.rs
- src/handlers/http/modal/query/querier_role.rs
- src/handlers/http/modal/query_server.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/handlers/http/modal/ingest_server.rs
Fixes #XXXX.
Description
This PR has:
Summary by CodeRabbit
New Features
Improvements
Chores